17 research outputs found

    Dialogue history integration into end-to-end signal-to-concept spoken language understanding systems

    This work investigates embeddings for representing dialogue history in spoken language understanding (SLU) systems. We focus on the scenario in which semantic information is extracted directly from the speech signal by a single end-to-end neural network model. We propose to integrate dialogue history into an end-to-end signal-to-concept SLU system. The dialogue history is represented in the form of dialogue history embedding vectors (so-called h-vectors) and is provided as additional information to end-to-end SLU models in order to improve system performance. The following three types of h-vectors are proposed and experimentally evaluated in this paper: (1) supervised-all embeddings, which predict the bag of concepts expected in the user's answer to the last dialogue system response; (2) supervised-freq embeddings, which focus on predicting only a selected set of semantic concepts (corresponding to the most frequent errors in our experiments); and (3) unsupervised embeddings. Experiments on the MEDIA corpus for the semantic slot filling task demonstrate that the proposed h-vectors improve model performance.
    Comment: Accepted for ICASSP 2020 (Submitted: October 21, 2019)
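    The abstract does not specify exactly where the h-vector enters the network, so the following is a minimal PyTorch sketch of one plausible integration point: broadcasting an utterance-level dialogue-history embedding across time and concatenating it with each encoder frame before a per-frame concept classifier. All dimensions and the fusion scheme are illustrative assumptions, not the paper's exact architecture.

    ```python
    import torch
    import torch.nn as nn

    class HVectorSLU(nn.Module):
        """Toy end-to-end signal-to-concept SLU model conditioned on a
        dialogue-history h-vector (fusion point is an assumption)."""

        def __init__(self, n_feats=40, enc_dim=256, h_dim=64, n_concepts=100):
            super().__init__()
            self.encoder = nn.LSTM(n_feats, enc_dim, num_layers=3, batch_first=True)
            # Project concatenated [encoder frame; h-vector] back to enc_dim.
            self.fuse = nn.Linear(enc_dim + h_dim, enc_dim)
            self.classifier = nn.Linear(enc_dim, n_concepts)  # per-frame concept tags

        def forward(self, feats, h_vector):
            # feats: (batch, time, n_feats); h_vector: (batch, h_dim)
            enc, _ = self.encoder(feats)
            # Broadcast the utterance-level h-vector across all time steps.
            h = h_vector.unsqueeze(1).expand(-1, enc.size(1), -1)
            fused = torch.tanh(self.fuse(torch.cat([enc, h], dim=-1)))
            return self.classifier(fused)

    model = HVectorSLU()
    logits = model(torch.randn(2, 120, 40), torch.randn(2, 64))
    print(logits.shape)  # torch.Size([2, 120, 100])
    ```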

    LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech

    Self-Supervised Learning (SSL) using huge amounts of unlabeled data has been successfully explored for image and natural language processing. Recent works have also investigated SSL from speech, notably succeeding in improving performance on downstream tasks such as automatic speech recognition (ASR). While these works suggest it is possible to reduce dependence on labeled data for building efficient speech systems, their evaluation was mostly made on ASR and under multiple, heterogeneous experimental settings (most of them for English). This calls into question the objective comparison of SSL approaches and the evaluation of their impact on building speech systems. In this paper, we propose LeBenchmark: a reproducible framework for assessing SSL from speech. It includes not only ASR (high- and low-resource) tasks but also spoken language understanding, speech translation and emotion recognition. We also focus on speech technologies in a language other than English: French. SSL models of different sizes are trained on carefully sourced and documented datasets. Experiments show that SSL is beneficial for most but not all tasks, which confirms the need for exhaustive and reliable benchmarks to evaluate its real impact. LeBenchmark is shared with the scientific community for reproducible research in SSL from speech.
    Comment: Will be presented at Interspeech 2021
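    As a usage illustration, the sketch below extracts frame-level representations from a pre-trained French wav2vec 2.0 checkpoint with the Hugging Face transformers API; these features would then feed a downstream head (ASR, SLU, AST, emotion recognition). The model identifier is an assumption (LeBenchmark publishes checkpoints under the "LeBenchmark" Hub organization); the dummy audio stands in for a real French utterance.

    ```python
    import torch
    from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

    # Model identifier is an assumption, not taken from the abstract.
    MODEL_ID = "LeBenchmark/wav2vec2-FR-7K-large"

    extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_ID)
    model = Wav2Vec2Model.from_pretrained(MODEL_ID)
    model.eval()

    # One second of dummy 16 kHz audio standing in for a real recording.
    waveform = torch.zeros(16000)

    inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, frames, hidden_dim)

    # Frame-level SSL features ready for a downstream task head.
    print(hidden.shape)
    ```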

    LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech

    Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains, including computer vision and natural language processing. Speech processing has drastically benefited from SSL, as most of the current domain-related tasks are now approached with pre-trained models. This work introduces LeBenchmark 2.0, an open-source framework for assessing and building SSL-equipped French speech technologies. It includes documented, large-scale corpora with up to 14,000 hours of heterogeneous speech; ten pre-trained SSL wav2vec 2.0 models, containing from 26 million to one billion learnable parameters, shared with the community; and an evaluation protocol made of six downstream tasks that complements existing benchmarks. LeBenchmark 2.0 also presents unique perspectives on pre-trained SSL models for speech, with an investigation of frozen versus fine-tuned downstream models, task-agnostic versus task-specific pre-trained models, as well as a discussion of the carbon footprint of large-scale model training.
    Comment: Under submission at Computer Speech and Language. Preprint allowed
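    The frozen-versus-fine-tuned distinction mentioned above boils down to whether the SSL encoder's weights are updated during downstream training. A minimal sketch of the two regimes, with a hypothetical helper name (prepare_encoder) and an assumed model identifier:

    ```python
    from transformers import Wav2Vec2Model

    def prepare_encoder(model_id: str, frozen: bool) -> Wav2Vec2Model:
        encoder = Wav2Vec2Model.from_pretrained(model_id)
        if frozen:
            # Frozen regime: the SSL encoder is a fixed feature extractor;
            # only the downstream task head receives gradient updates.
            for param in encoder.parameters():
                param.requires_grad = False
            encoder.eval()
        # Fine-tuned regime: all encoder weights stay trainable and are
        # updated jointly with the downstream head.
        return encoder

    # Example (model identifier is an assumption):
    # encoder = prepare_encoder("LeBenchmark/wav2vec2-FR-7K-large", frozen=True)
    ```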

    Dynamic Combination of Automatic Speech Recognition Systems by Driven Decoding


    Stochastic Finite State Automata Language Model Triggered by Dialogue States

    Within the framework of natural spoken dialogue systems, this paper describes a method for dynamically adapting a language model (LM) to the detected dialogue state. This LM combines a standard n-gram model with Stochastic Finite State Automata (SFSAs). During training, the sentence corpus used to train the LM is split into several hierarchical clusters in a two-step process involving both explicit knowledge and statistical criteria. All the clusters are stored in a binary tree whose root node holds the whole corpus. Each level of the tree corresponds to a higher specialization of the sub-corpora attached to its nodes, and each node corresponds to a different dialogue state. From the same sentence corpus, SFSAs are extracted in order to model longer contexts than those captured by the standard n-gram model. Each tree node is given a set of SFSAs as well as a sub-LM that combines a bigram trained on the node's sub-corpus with the selected SFSAs. A first decoding pass produces a word graph and a first sentence hypothesis; this hypothesis is used to find the optimal node in the LM tree. The word graph is then rescored with the LM attached to the selected node. By adapting the LM to the detected dialogue state, we show a statistically significant reduction in word error rate (WER) on a dialogue corpus collected by France Telecom R&D.
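    The two-pass flow can be summarized in code. The sketch below is a structural outline under stated assumptions: the class and function names (LMTreeNode, first_pass_decode, rescore) are hypothetical, the node-scoring model is a placeholder, and the greedy tree descent is one plausible reading of "find the optimal node", which the abstract does not specify.

    ```python
    from dataclasses import dataclass, field

    @dataclass
    class LMTreeNode:
        """One dialogue state: a bigram sub-LM plus its selected SFSAs."""
        sub_lm: object
        sfsas: list
        children: list = field(default_factory=list)

        def score(self, hypothesis: list) -> float:
            # Placeholder: likelihood of the first-pass hypothesis under
            # this node's combined bigram + SFSA model.
            raise NotImplementedError

    def select_node(root: LMTreeNode, hypothesis: list) -> LMTreeNode:
        """Greedy descent (an assumption): at each level, follow the child
        whose sub-LM best explains the first-pass hypothesis; stop when no
        child improves on the current node."""
        node = root
        while node.children:
            best = max(node.children, key=lambda c: c.score(hypothesis))
            if best.score(hypothesis) <= node.score(hypothesis):
                break
            node = best
        return node

    def two_pass_decode(audio, root, first_pass_decode, rescore):
        word_graph, first_hyp = first_pass_decode(audio)  # pass 1: generic LM
        node = select_node(root, first_hyp)               # detect dialogue state
        return rescore(word_graph, node.sub_lm, node.sfsas)  # pass 2: adapted LM
    ```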